The objective for this project is to find the evidance for global warming. From National Geographic’s What is global warming, explained page (https://www.nationalgeographic.com/environment/global-warming/global-warming-overview/), it is said that raising average temperature and increasing occurrences of extreme temperature are two significant factors that show global warming. So to prove that global warming is true, we are going to analyze the trend of temperature changes and the extreme values.
Firstly, we import the data we are going use to R. Secondly, we build linear regression models and make inference for each model’s coefficient of time. Thirdly, we plot the temperature distribution for every month. Lastly, we make conclusion based on the analysis.
The data we use is NOAA’s Boston’s air and water temperature from 1987 to 2016. There are 246245 observations in total. For better comparison, we remove the NA values and use the temperature at 12pm everyday. As a result, we have 10146 observations for analyzing. The year, month, day, hour, air temperature and water temperature are stored in the data frame as string. For later analyzing, we change the data types for air and water temperature to double. We create a new DATE column combining the year, month, day and hour columns into seconds from 1970/1/1 form.
load("MR.Rdata")
`%notin%` <- Negate(`%in%`)
MRsub <- filter(MR, ATMP%notin%c(999,99,"999","99","999.0","99.0") & WTMP%notin%c(999,99,"999","99","999.0","99.0") & hh=="12")
MRsub$ATMP <- as.double(MRsub$ATMP)
MRsub$WTMP <- as.double(MRsub$WTMP)
buoyTimeStr <- paste(MRsub$YYYY, MRsub$MM, MRsub$DD, MRsub$hh, sep="-")
MRsub$TIME <- ymd_h(buoyTimeStr)
MRsub$DATE <- unclass(MRsub$TIME)
MRsub$YYYY <- as.double(MRsub$YYYY)
Since the temperature varies during different months, we decide to build linear regression models for each month respectively and for annual average.
numMonths <- levels(factor(MRsub$MM))
nameMonths <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
fitA_list <- list()
fitW_list <- list()
for(i in 1:12){
fitA_list[[i]] <- lm(ATMP~DATE, data=MRsub[MRsub$MM==numMonths[i],])
fitW_list[[i]] <- lm(WTMP~DATE, data=MRsub[MRsub$MM==numMonths[i],])
}
for(i in 1:12){
print(paste("Coefficient of DATE for ", nameMonths[i],"'s ATMP & WTMP: ",as.character(coef(fitA_list[[i]])[2]), " & ", as.character(coef(fitW_list[[i]])[2]),". By average, every year the air temperature of this month changes ", as.character(60*60*24*365*coef(fitA_list[[i]])[2]), " celcius degrees, and the water temperature changes ",as.character(60*60*24*365*coef(fitW_list[[i]])[2]), "celcius degrees.", sep=""))
}
## [1] "Coefficient of DATE for Jan's ATMP & WTMP: -1.63427898906277e-09 & -1.6954589835446e-10. By average, every year the air temperature of this month changes -0.0515386221990835 celcius degrees, and the water temperature changes -0.00534679945050626celcius degrees."
## [1] "Coefficient of DATE for Feb's ATMP & WTMP: -3.26152998965013e-10 & -8.53643501945669e-11. By average, every year the air temperature of this month changes -0.0102855609753606 celcius degrees, and the water temperature changes -0.00269205014773586celcius degrees."
## [1] "Coefficient of DATE for Mar's ATMP & WTMP: 5.9253444863983e-11 & 4.29831785118652e-10. By average, every year the air temperature of this month changes 0.00186861663723057 celcius degrees, and the water temperature changes 0.0135551751755018celcius degrees."
## [1] "Coefficient of DATE for Apr's ATMP & WTMP: 6.37549620314097e-10 & 3.7143885901697e-10. By average, every year the air temperature of this month changes 0.0201057648262254 celcius degrees, and the water temperature changes 0.0117136958579592celcius degrees."
## [1] "Coefficient of DATE for May's ATMP & WTMP: 3.23287721437907e-10 & 3.41624549231906e-10. By average, every year the air temperature of this month changes 0.0101952015832658 celcius degrees, and the water temperature changes 0.0107734717845774celcius degrees."
## [1] "Coefficient of DATE for Jun's ATMP & WTMP: 3.52881416573906e-10 & 1.27886381258972e-10. By average, every year the air temperature of this month changes 0.0111284683530747 celcius degrees, and the water temperature changes 0.00403302491938294celcius degrees."
## [1] "Coefficient of DATE for Jul's ATMP & WTMP: 2.62092737847569e-10 & -1.92985772601081e-11. By average, every year the air temperature of this month changes 0.00826535658076093 celcius degrees, and the water temperature changes -0.000608599932474769celcius degrees."
## [1] "Coefficient of DATE for Aug's ATMP & WTMP: 5.64059225025692e-10 & 4.82843450189986e-10. By average, every year the air temperature of this month changes 0.0177881717204102 celcius degrees, and the water temperature changes 0.0152269510451914celcius degrees."
## [1] "Coefficient of DATE for Sep's ATMP & WTMP: 7.26705397394259e-11 & 2.85699559173011e-10. By average, every year the air temperature of this month changes 0.00229173814122253 celcius degrees, and the water temperature changes 0.00900982129808006celcius degrees."
## [1] "Coefficient of DATE for Oct's ATMP & WTMP: 3.2018676420886e-10 & 1.84767883931108e-10. By average, every year the air temperature of this month changes 0.0100974097960906 celcius degrees, and the water temperature changes 0.00582683998765142celcius degrees."
## [1] "Coefficient of DATE for Nov's ATMP & WTMP: -8.22150262900542e-10 & 1.5903033090485e-10. By average, every year the air temperature of this month changes -0.0259273306908315 celcius degrees, and the water temperature changes 0.00501518051541535celcius degrees."
## [1] "Coefficient of DATE for Dec's ATMP & WTMP: 1.83495605392501e-09 & 4.96921779883005e-10. By average, every year the air temperature of this month changes 0.0578671741165791 celcius degrees, and the water temperature changes 0.0156709252503904celcius degrees."
plotFit <- ggplot(data=MRsub)+
geom_point(aes(x=DATE,y=ATMP),alpha=0.1,color="black")+
geom_point(aes(x=DATE,y=WTMP),alpha=0.1, color = "#56B4E9")+
geom_smooth(aes(x=DATE,y=ATMP,color="ATMP"),method="lm")+
geom_smooth(aes(x=DATE,y=WTMP,color="WTMP"),method="lm")+
scale_color_manual("",breaks=c("ATMP","WTMP"),values=c("black","blue"))+
ggtitle("Temperature vs. Time")+
xlab("Seconds from 1970/1/1")+
ylab("Tempurature")+
facet_wrap(~factor(MM))
plotFit
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
head(MRsub)
## # A tibble: 6 x 18
## YYYY MM DD hh WD WSPD GSP WVHT DPD APD MWD BAR ATMP
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 1987 12 04 12 275 06.2 06.9 99 99 99 999 1018.6 20
## 2 1987 12 05 12 006 07.3 07.9 99 99 99 999 1018.9 15.8
## 3 1987 12 06 12 006 08.4 09.3 99 99 99 999 1020.8 15.5
## 4 1987 12 07 12 095 08.3 09.1 99 99 99 999 1019.4 23
## 5 1987 12 08 12 087 12.1 13.6 99 99 99 999 1016.9 23.9
## 6 1987 12 09 12 293 04.6 05.3 99 99 99 999 1014.7 21
## # … with 5 more variables: WTMP <dbl>, DEWP <chr>, VIS <chr>, TIME <dttm>,
## # DATE <dbl>
avg <- MRsub%>%group_by(YYYY)%>%summarize(meanATMP = mean(ATMP), meanWTMP = mean(WTMP))
## `summarise()` ungrouping output (override with `.groups` argument)
plotAnnualAvg <- ggplot(data=avg)+
geom_point(aes(x=YYYY, y=meanATMP), alpha=0.1, color="black")+
geom_point(aes(x=YYYY, y=meanWTMP), alpha=0.1, color="#56B4E9")+
geom_smooth(aes(x=YYYY,y=meanATMP,color="ATMP"),method="lm")+
geom_smooth(aes(x=YYYY,y=meanWTMP,color="WTMP"),method="lm")+
scale_color_manual("",breaks=c("ATMP","WTMP"),values=c("black","blue"))+
ggtitle("Annual Average Temperature vs. Year")+
xlab("Year")+
ylab("Annual Average Tempurature")
plotAnnualAvg
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
fitA_annual <- lm(meanATMP~YYYY, data=avg)
fitW_annual <- lm(meanWTMP~YYYY, data=avg)
print(paste("By average, the annual average air temperature increases ", as.character(coef(fitA_annual)[2]), " celcius degrees and the annual average water temperature increases ", as.character(coef(fitW_annual)[2]), " celcius degrees every year. ", sep=""))
## [1] "By average, the annual average air temperature increases 0.0182281750720344 celcius degrees and the annual average water temperature increases 0.0126780717416379 celcius degrees every year. "
The coefficients of the linear regression models show the variation of monthly average air temperature and water temperature from 1987 to 2016. In January and February, both the air temperature and water temperature decrease. In March, April, May, June, August, September, October and December, both the air temperature and water temperature increase. In July, the air temperature increases while the water temperature decreases. In November, the air temperature decreases while the water temperature increases.
MR_Jan <- filter(MRsub, MM == "01")
MR_Feb <- filter(MRsub, MM == "02")
MR_Mar <- filter(MRsub, MM == "03")
MR_Apr <- filter(MRsub, MM == "04")
MR_May <- filter(MRsub, MM == "05")
MR_Jun <- filter(MRsub, MM == "06")
MR_Jul <- filter(MRsub, MM == "07")
MR_Aug <- filter(MRsub, MM == "08")
MR_Sep <- filter(MRsub, MM == "09")
MR_Oct <- filter(MRsub, MM == "10")
MR_Nov <- filter(MRsub, MM == "11")
MR_Dec <- filter(MRsub, MM == "12")
plotADist_Jan <- ggplot(MR_Jan,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("January's Air Temperature Distribution")
plotWDist_Jan <- ggplot(MR_Jan,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("January's Water Temperature Distribution")
grid.arrange(plotADist_Jan, plotWDist_Jan, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotADist_Feb <- ggplot(MR_Feb,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("February’s Air Temperature Distribution")
plotWDist_Feb <- ggplot(MR_Feb,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("February’s Water Temperature Distribution")
grid.arrange(plotADist_Feb, plotWDist_Feb, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotADist_Mar <- ggplot(MR_Mar,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("March’s Air Temperature Distribution")
plotWDist_Mar <- ggplot(MR_Mar,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("March’s Water Temperature Distribution")
grid.arrange(plotADist_Mar, plotWDist_Mar, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotADist_Apr <- ggplot(MR_Apr,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("April’s Air Temperature Distribution")
plotWDist_Apr <- ggplot(MR_Apr,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("April’s Water Temperature Distribution")
grid.arrange(plotADist_Apr, plotWDist_Apr, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotADist_May <- ggplot(MR_May,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("May’s Air Temperature Distribution")
plotWDist_May <- ggplot(MR_May,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("May’s Water Temperature Distribution")
grid.arrange(plotADist_May, plotWDist_May, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotADist_Jun <- ggplot(MR_Jun,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("June’s Air Temperature Distribution")
plotWDist_Jun <- ggplot(MR_Jun,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("June’s Water Temperature Distribution")
grid.arrange(plotADist_Jun, plotWDist_Jun, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotADist_Jul <- ggplot(MR_Jul,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("July’s Air Temperature Distribution")
plotWDist_Jul <- ggplot(MR_Jul,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("July’s Water Temperature Distribution")
grid.arrange(plotADist_Jul, plotWDist_Jul, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotADist_Aug <- ggplot(MR_Aug,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("August’s Air Temperature Distribution")
plotWDist_Aug <- ggplot(MR_Aug,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("August’s Water Temperature Distribution")
grid.arrange(plotADist_Aug, plotWDist_Aug, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotADist_Sep <- ggplot(MR_Sep,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("September’s Air Temperature Distribution")
plotWDist_Sep <- ggplot(MR_Sep,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("September’s Water Temperature Distribution")
grid.arrange(plotADist_Sep, plotWDist_Sep, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotADist_Oct <- ggplot(MR_Oct,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("October’s Air Temperature Distribution")
plotWDist_Oct <- ggplot(MR_Oct,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("October’s Water Temperature Distribution")
grid.arrange(plotADist_Oct, plotWDist_Oct, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotADist_Nov <- ggplot(MR_Nov,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("November’s Air Temperature Distribution")
plotWDist_Nov <- ggplot(MR_Nov,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("November’s Water Temperature Distribution")
grid.arrange(plotADist_Nov, plotWDist_Nov, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plotADist_Dec <- ggplot(MR_Dec,aes(x=ATMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("December’s Air Temperature Distribution")
plotWDist_Dec <- ggplot(MR_Dec,aes(x=WTMP))+geom_histogram()+facet_wrap(~factor(YYYY))+ggtitle("December’s Water Temperature Distribution")
grid.arrange(plotADist_Dec, plotWDist_Dec, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The distribution plots do not show a obvious shift in frequency of extreme temperatures.
Since the frequency of extreme temperatures does not change largely, we are going to focus on the linear regression models. Although air temperature and water temperature decreases in some months, the overall temperature is getting higher and higher every year. Besides, decreasing temperature in winter is a signal of extreme climate change. Therefore, we could prove that the global warming is true.